Conversation

@hannahkm
Contributor

@hannahkm hannahkm commented Sep 5, 2025

What does this PR do?

Implements the v1 trace protocol. Includes commits cherry-picked from #3932 since I closed my previous PR.

The code is DISABLED by default. In the future, v1.0 will replace v0.4 as the default protocol.

Motivation

https://datadoghq.atlassian.net/browse/LANGPLAT-750
RFC

Reviewer's Checklist

  • Changed code has unit tests for its functionality at or near 100% coverage.
  • System-Tests covering this feature have been added and enabled with the va.b.c-dev version tag.
  • There is a benchmark for any new code, or changes to existing code.
  • If this interacts with the agent in a new way, a system test has been added.
  • New code is free of linting errors. You can check this by running ./scripts/lint.sh locally.
  • Add an appropriate team label so this PR gets put in the right place for the release notes.
  • Non-trivial go.mod changes, e.g. adding new modules, are reviewed by @DataDog/dd-trace-go-guild.

Unsure? Have a question? Request a review!

hannahkm and others added 3 commits September 5, 2025 15:06
- Added msgp struct tags to fields in payloadV1 and traceChunk for serialization.
- Introduced EncodeMsg methods for payloadV1 and spanListV1 to support msgp encoding.
- Updated field types in traceChunk to enhance compatibility with msgp serialization.
@pr-commenter

pr-commenter bot commented Sep 5, 2025

Benchmarks

Benchmark execution time: 2025-10-29 18:55:12

Comparing candidate commit 6e71dbd in PR branch hannahkm/implement-v1-serialization with baseline commit abd2718 in branch main.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 23 metrics, 0 unstable metrics.

scenario:BenchmarkParallelMetrics/distribution/get-handle-24

  • 🟩 execution_time [-3.562ns; -1.416ns] or [-5.189%; -2.062%]

@datadog-official
Contributor

datadog-official bot commented Sep 5, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 6e71dbd

@hannahkm hannahkm requested review from a team as code owners October 27, 2025 19:11
func checkEndpoint(c *http.Client, endpoint string, protocol float64) error {
	b := []byte{0x90} // empty array
	if protocol == traceProtocolV1 {
		b = []byte{0x80} // empty map
Contributor Author

Payload v1 is represented by a MessagePack map, whereas the empty payload in v0.4 was represented by an array. To prevent failures when we send empty data, we need to check the payload version and send the correct data type.
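
For illustration, the choice described above can be sketched as a small helper. This is a minimal sketch, not the code in this PR: `emptyPayload` is a hypothetical name, and the constant values are assumptions (only the constant names appear in the diff). In MessagePack, `0x90` is a fixarray of length 0 and `0x80` is a fixmap of length 0.

```go
package main

import "fmt"

// Protocol versions. The names mirror the diff; the values are assumed.
const (
	traceProtocolV04 = 0.4
	traceProtocolV1  = 1.0
)

// emptyPayload (hypothetical helper) returns the smallest valid MessagePack
// body for the given protocol: an empty map for v1, an empty array otherwise.
func emptyPayload(protocol float64) []byte {
	if protocol == traceProtocolV1 {
		return []byte{0x80} // fixmap with zero entries
	}
	return []byte{0x90} // fixarray with zero elements
}

func main() {
	fmt.Printf("v0.4 empty payload: %#x\n", emptyPayload(traceProtocolV04))
	fmt.Printf("v1 empty payload:   %#x\n", emptyPayload(traceProtocolV1))
}
```

Sending the wrong container type is what the check guards against: an agent endpoint expecting a map may reject a body that decodes as an array, and vice versa.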

// • https://github.com/DataDog/dd-trace-go/pull/475
// • https://github.com/DataDog/dd-trace-go/pull/549
// • https://github.com/DataDog/dd-trace-go/pull/976
type payloadV04 struct {
Contributor Author

@hannahkm hannahkm Oct 27, 2025

For clarity, the old payload (previously named payload and living in payload.go) is named payloadV04 to represent v0.4. The associated functions are moved to payload_v04.go, and v1 payload functionality is added to payload_v1.go.

Contributor

@mtoffl01 mtoffl01 left a comment

Left some feedback for improvements, but conceptually looks good

Comment on lines +596 to +603
if internal.BoolEnv("DD_TRACE_V1_PAYLOAD_FORMAT_ENABLED", false) {
	c.traceProtocol = traceProtocolV1
	if t, ok := c.transport.(*httpTransport); ok {
		t.traceURL = fmt.Sprintf("%s%s", c.agentURL.String(), tracesAPIPathV1)
	}
} else {
	c.traceProtocol = traceProtocolV04
}
Contributor

Seems weird that we're modifying httpTransport.traceURL directly for the traceProtocolV1 case, but not for other cases.

Maybe we can modify the newHTTPTransport method to consider tracesAPIPathV1 as well, and move these if/else checks above where we set c.transport.

Like this (pseudocode):

if internal.BoolEnv("DD_TRACE_V1_PAYLOAD_FORMAT_ENABLED", false) {
  c.traceProtocol = traceProtocolV1
} else {
  c.traceProtocol = traceProtocolV04
}
if c.transport == nil {
  // newHTTPTransport would need to accept the protocol and pick the matching traces path.
  c.transport = newHTTPTransport(c.agentURL.String(), c.httpClient, c.traceProtocol)
}
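
One way the constructor side of this suggestion might look is sketched below. Everything here is an assumption for illustration: the real `newHTTPTransport` also takes an HTTP client, the two-argument signature is hypothetical, and the endpoint path values are assumed rather than taken from this PR.

```go
package main

import "fmt"

const (
	traceProtocolV04 = 0.4
	traceProtocolV1  = 1.0

	// Assumed endpoint paths; only the constant name tracesAPIPathV1
	// appears in the diff, not its value.
	tracesAPIPath   = "/v0.4/traces"
	tracesAPIPathV1 = "/v1.0/traces"
)

// httpTransport is reduced to the single field relevant to this sketch.
type httpTransport struct {
	traceURL string
}

// newHTTPTransport (hypothetical signature) picks the traces path from the
// protocol, so callers never have to patch traceURL after construction.
func newHTTPTransport(agentURL string, protocol float64) *httpTransport {
	path := tracesAPIPath
	if protocol == traceProtocolV1 {
		path = tracesAPIPathV1
	}
	return &httpTransport{traceURL: agentURL + path}
}

func main() {
	t := newHTTPTransport("http://localhost:8126", traceProtocolV1)
	fmt.Println(t.traceURL)
}
```

Keeping the path selection inside the constructor means the type assertion on `c.transport` is no longer needed, which is the asymmetry the comment points out.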

Comment on lines -251 to +77
- // safePayload provides a thread-safe wrapper around unsafePayload.
+ // safePayload provides a thread-safe wrapper around payload.
  type safePayload struct {
  	mu sync.RWMutex
- 	p *unsafePayload
+ 	p payload
Contributor

I'm slightly confused by *unsafePayload becoming payload. The v0.4 payload was always "unsafe," and the v1 payload is also marked as "not safe for concurrent use."

Member

@mtoffl01 payload is an interface so we can reuse safePayload with the unsafe payloads for both trace protocols.
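
A rough sketch of that design, under stated assumptions: the `push` and `itemCount` methods, the string-based span representation, and the `fill` helper are hypothetical simplifications, not the interface defined in this PR. The point is only that one mutex-guarded wrapper can serve both unsynchronized implementations.

```go
package main

import (
	"fmt"
	"sync"
)

// payload is a trimmed-down, hypothetical version of the interface.
// Implementations are NOT safe for concurrent use on their own.
type payload interface {
	push(span string)
	itemCount() int
}

type payloadV04 struct{ spans []string }

func (p *payloadV04) push(s string)  { p.spans = append(p.spans, s) }
func (p *payloadV04) itemCount() int { return len(p.spans) }

type payloadV1 struct{ spans []string }

func (p *payloadV1) push(s string)  { p.spans = append(p.spans, s) }
func (p *payloadV1) itemCount() int { return len(p.spans) }

// safePayload serializes access to whichever payload it wraps.
type safePayload struct {
	mu sync.RWMutex
	p  payload
}

func (sp *safePayload) push(s string) {
	sp.mu.Lock()
	defer sp.mu.Unlock()
	sp.p.push(s)
}

func (sp *safePayload) itemCount() int {
	sp.mu.RLock()
	defer sp.mu.RUnlock()
	return sp.p.itemCount()
}

// fill pushes n spans from n goroutines and returns the final count.
func fill(sp *safePayload, n int) int {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sp.push("span")
		}()
	}
	wg.Wait()
	return sp.itemCount()
}

func main() {
	for _, p := range []payload{&payloadV04{}, &payloadV1{}} {
		fmt.Println(fill(&safePayload{p: p}, 100))
	}
}
```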

if p, ok := span.Context().SamplingPriority(); ok {
	origin = span.Context().origin // TODO(darccio): are we sure that origin will be shared across all the spans in the chunk?
	priority = p                   // TODO(darccio): the same goes for priority.
	dm := span.context.trace.propagatingTag(keyDecisionMaker)
Contributor

Is it possible that any of these fields could be nil, so that we hit an invalid memory address (nil pointer dereference) error? 😄
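
If a guard turns out to be needed, it could look roughly like the sketch below. The types are hypothetical stand-ins for the tracer's real span, context, and trace types, the `decisionMaker` helper is invented for illustration, and the tag key value is an assumption. Whether the real fields can actually be nil at this call site is not established in this thread.

```go
package main

import "fmt"

// Assumed tag key; the diff only shows the identifier keyDecisionMaker.
const keyDecisionMaker = "_dd.p.dm"

// Hypothetical stand-ins for the tracer's types.
type trace struct{ tags map[string]string }

// propagatingTag tolerates a nil receiver, so callers can chain through it.
func (t *trace) propagatingTag(key string) string {
	if t == nil {
		return ""
	}
	return t.tags[key] // reading from a nil map is safe and yields ""
}

type spanContext struct {
	origin string
	trace  *trace
}

type span struct{ context *spanContext }

// decisionMaker returns the decision-maker tag, or "" if any link in the
// span -> context -> trace chain is nil.
func decisionMaker(s *span) string {
	if s == nil || s.context == nil {
		return ""
	}
	return s.context.trace.propagatingTag(keyDecisionMaker)
}

func main() {
	full := &span{context: &spanContext{trace: &trace{tags: map[string]string{keyDecisionMaker: "-4"}}}}
	fmt.Printf("%q %q %q\n", decisionMaker(full), decisionMaker(&span{}), decisionMaker(nil))
}
```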

@hannahkm
Contributor Author

/merge

@dd-devflow-routing-codex

dd-devflow-routing-codex bot commented Oct 29, 2025

View all feedbacks in Devflow UI.

2025-10-29 18:55:34 UTC ℹ️ Start processing command /merge


2025-10-29 18:55:45 UTC ℹ️ MergeQueue: waiting for PR to be ready

This merge request is not mergeable according to GitHub. Common reasons include pending required checks, missing approvals, or merge conflicts — but it could also be blocked by other repository rules or settings.
It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.


2025-10-29 19:04:27 UTC ℹ️ MergeQueue: merge request added to the queue

The expected merge time in main is approximately 19m (p90).


2025-10-29 19:20:20 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit 26c62ab into main Oct 29, 2025
245 checks passed
@dd-mergequeue dd-mergequeue bot deleted the hannahkm/implement-v1-serialization branch October 29, 2025 19:20
hannahkm added a commit that referenced this pull request Oct 29, 2025
Co-authored-by: darccio <[email protected]>
Co-authored-by: hannahs.kim <[email protected]>
func DecodeSpanEvents([]byte, *stringTable) ([]spanEvent, []byte, error)
func DecodeSpanLinks([]byte, *stringTable) ([]SpanLink, []byte, error)
func DecodeSpans([]byte, *stringTable) (spanList, []byte, error)
func DecodeTraceChunks([]byte, *stringTable) ([]traceChunk, []byte, error)
Member

Why are we adding these public APIs?

Member

That's a good catch. @hannahkm Do we need them? I think everything happens inside the same package, right?

Contributor Author

Good point. We don't need them to be public, since we aren't calling these functions from anywhere else (and probably shouldn't be).

e-n-0 pushed a commit that referenced this pull request Oct 30, 2025
@darccio darccio mentioned this pull request Nov 21, 2025
7 tasks
6 participants